Search CORE

37 research outputs found

Towards the Global SentiWordNet

Author: Bandyopadhyay Sivaji
Das Amitava
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study

Waseda University Repository

Identifying Emotional Expressions, Intensities and Sentence level Emotion Tags using a Supervised Framework

Author: Bandyopadhyay Sivaji
Das Dipankar
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study

Waseda University Repository

Finding Emotion Holder from Bengali Blog Texts -An Unsupervised Syntactic Approach

Author: Bandyopadhyay Sivaji
Das Dipankar
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study

Waseda University Repository

A Multilevel Approach to Sentiment Analysis of Figurative Language in Twitter

Author: Bandyopadhyay Sivaji
Das Dipankar
Gopal Patra Braja
Mazumda Soumadeep
Rosso Paolo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

[EN] Commendable amount of work has been attempted in the field of Sentiment Analysis or Opinion Mining from natural language texts and Twitter texts. One of the main goals in such tasks is to assign polarities (positive or negative) to a piece of text. But, at the same time, one of the important as well as difficult issues is how to assign the degree of positivity or negativity to certain texts. The answer becomes more complex when we perform a similar task on figurative language texts collected from Twitter. Figurative language devices such as irony and sarcasm contain an intentional secondary or extended meaning hidden within the expressions. In this paper we present a novel approach to identify the degree of the sentiment (fine grained in an 11-point scale) for the figurative language texts. We used several semantic features such as sentiment and intensifiers as well as we introduced sentiment abruptness, which measures the variation of sentiment from positive to negative or vice versa. We trained our systems at multiple levels to achieve the maximum cosine similarity of 0.823 and minimum mean square error of 2.170.The work reported in this paper is supported by a grant from the project “CLIA System Phase II” funded by Department of Electronics and Information Technology (DeitY), Ministry of Communications and Information Technology (MCIT), Government of India. The work of the fourth author is also supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAPATER (PrometeoII/2014/030).Gopal Patra, B.; Mazumda, S.; Das, D.; Rosso, P.; Bandyopadhyay, S. (2018). A Multilevel Approach to Sentiment Analysis of Figurative Language in Twitter. Lecture Notes in Computer Science. 9624:281-291. https://doi.org/10.1007/978-3-319-75487-1_22S2812919624Ghosh, A., Li, G., Veale, T., Rosso, P., Shutova, E., Reyes, A., Barnden, J.: Semeval-2015 task 11: sentiment analysis of figurative language in Twitter. In: 9th International Workshop on Semantic Evaluation (SemEval), Co-located with NAACL, Denver, Colorado, pp. 470–478. Association for Computational Linguistics (2015)Reyes, A., Rosso, P., Veale, T.: A multidimensional approach for detecting irony in Twitter. Lang. Resour. Eval. 47(1), 239–268 (2013)Reyes, A., Rosso, P., Buscaldi, D.: From humor recognition to irony detection: the figurative language of social media. Data Knowl. Eng. 74, 1–12 (2012)Patra, B.G., Mandal, S., Das, D., Bandyopadhyay, S.: JU_CSE: a conditional random field (CRF) based approach to aspect based sentiment analysis. In: 8th International Workshop on Semantic Evaluation (SemEval), Co-located with COLING, Dublin, Ireland, pp. 370–374. Association for Computational Linguistics (2014)Ozdemir, C., Bergler, S.: CLaC-SentiPipe: SemEval2015 subtasks 10 B, E, and task 11. In: 9th International Workshop on Semantic Evaluation (SemEval), Co-located with NAACL, Denver, Colorado, pp. 479–485. Association for Computational Linguistics (2015)Strapparava, C., Valitutti, A.: Wordnet-affect: an affective extension of wordnet. In: 4th International Conference on Language Resources and Evaluation, pp. 1083–1086 (2004)Léger, J.C.: Menger curvature and rectifiability. Ann. Math. 149, 831–869 (1999)Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: 18th International Conference on Machine Learning, pp. 282–289 (2001)de Albornoz, J.C., Plaza, L., Gervas, P.: SentiSense: an easily scalable concept-based affective lexicon for sentiment analysis. In: 8th International Conference on Language Resources and Evaluation, pp. 3562–3567 (2012)Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M.: Lexicon-based methods for sentiment analysis. Comput. Linguist. 37(2), 267–307 (2011)Naveed, N., Gottron, T., Kunegis, J., Alhadi, A.C.: Bad news travel fast: a content-based analysis of interestingness on Twitter. In: 3rd International Web Science Conference. ACM (2011)Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. In: NAACL. Association for Computational Linguistics (2013)Mohammad, S., Turney, P.: Crowdsourcing a word-emotion association lexicon. Comput. Intell. 29(3), 436–465 (2013)Baccianella, S., Esuli, A., Sebastiani, F.: Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: 7th Conference on International Language Resources and Evaluation, Valletta, Malta (2010)Choi, Y., Wiebe, J.: +/

-

EffectWordNet: sense-level lexicon acquisition for opinion inference. In: EMNLP (2014)Whissell, C., Fournier, M., Pelland, R., Weir, D., Makarec, K.: A dictionary of affect in language: IV. Reliability, validity, and applications. Percept. Mot. Skills 62(3), 875–888 (1986)Patra, B.G., Takamura, H., Das, D., Okumura, M., Bandyopadhyay, S.: Construction of emotional lexicon using potts model. In: International Joint Conference on Natural Language Processing (IJCNLP), pp. 674–679 (2013)Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2, 1–135 (2008)Vilares, D., Alonso, M.A., Gomez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. J. Assoc. Inf. Sci. Technol. 66(9), 1799–1816 (2015)Barbieri, F., Ronzano, F., Saggion, H.: UPF-taln: SemEval 2015 tasks 10 and 11 sentiment analysis of literal and figurative language in Twitter. In: SemEval-2015, pp. 704–708 (2015

RiuNet

MSIR@FIRE: A Comprehensive Report from 2013 to 2016

Author: Bandyopadhyay Sivaji
Banerjee Somnath
Chakma Kunal
Choudhury Monojit
Das Amitava
Kumar Naskar Sudip
Rosso Paolo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

[EN] India is a nation of geographical and cultural diversity where over 1600 dialects are spoken by the people. With the technological advancement, penetration of the internet and cheaper access to mobile data, India has recently seen a sudden growth of internet users. These Indian internet users generate contents either in English or in other vernacular Indian languages. To develop technological solutions for the contents generated by the Indian users using the Indian languages, the Forum for Information Retrieval Evaluation (FIRE) was established and held for the first time in 2008. Although Indian languages are written using indigenous scripts, often websites and user-generated content (such as tweets and blogs) in these Indian languages are written using Roman script due to various socio-cultural and technological reasons. A challenge that search engines face while processing transliterated queries and documents is that of extensive spelling variation. MSIR track was first introduced in 2013 at FIRE and the aim of MSIR was to systematically formalize several research problems that one must solve to tackle the code mixing in Web search for users of many languages around the world, develop related data sets, test benches and most importantly, build a research community focusing on this important problem that has received very little attention. This document is a comprehensive report on the 4 years of MSIR track evaluated at FIRE between 2013 and 2016.Somnath Banerjee and Sudip Kumar Naskar are supported by Media Lab Asia, MeitY, Government of India, under the Visvesvaraya PhD Scheme for Electronics & IT. The work of Paolo Rosso was partially supported by the MISMIS research project PGC2018-096212-B-C31 funded by the Spanish MICINN.Banerjee, S.; Choudhury, M.; Chakma, K.; Kumar Naskar, S.; Das, A.; Bandyopadhyay, S.; Rosso, P. (2020). MSIR@FIRE: A Comprehensive Report from 2013 to 2016. SN Computer Science. 1(55):1-15. https://doi.org/10.1007/s42979-019-0058-0S115155Ahmed UZ, Bali K, Choudhury M, Sowmya VB. Challenges in designing input method editors for Indian languages: the role of word-origin and context. In: Advances in text input methods (WTIM 2011). 2011. pp. 1–9Banerjee S, Chakma K, Naskar SK, Das A, Rosso P, Bandyopadhyay S, Choudhury M. Overview of the mixed script information retrieval (MSIR) at fire-2016. In: Forum for information retrieval evaluation. Springer; 2016. pp. 39–49.Banerjee S, Kuila A, Roy A, Naskar SK, Rosso P, Bandyopadhyay S. A hybrid approach for transliterated word-level language identification: CRF with post-processing heuristics. In: Proceedings of the forum for information retrieval evaluation, ACM, 2014. pp. 54–59.Banerjee S, Naskar S, Rosso P, Bandyopadhyay S. Code mixed cross script factoid question classification—a deep learning approach. J Intell Fuzzy Syst. 2018;34(5):2959–69.Banerjee S, Naskar SK, Rosso P, Bandyopadhyay S. The first cross-script code-mixed question answering corpus. In: Proceedings of the workshop on modeling, learning and mining for cross/multilinguality (MultiLingMine 2016), co-located with the 38th European Conference on Information Retrieval (ECIR). 2016.Banerjee S, Naskar SK, Rosso P, Bandyopadhyay S. Named entity recognition on code-mixed cross-script social media content. Comput Sistemas. 2017;21(4):681–92.Barman U, Das A, Wagner J, Foster J. Code mixing: a challenge for language identification in the language of social media. In: Proceedings of the first workshop on computational approaches to code switching. 2014. pp. 13–23.Bhardwaj P, Pakray P, Bajpeyee V, Taneja A. Information retrieval on code-mixed Hindi–English tweets. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. 2016.Bhargava R, Khandelwal S, Bhatia A, Sharmai Y. Modeling classifier for code mixed cross script questions. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Bhattacharjee D, Bhattacharya, P. Ensemble classifier based approach for code-mixed cross-script question classification. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Chakma K, Das A. CMIR: a corpus for evaluation of code mixed information retrieval of Hindi–English tweets. In: The 17th international conference on intelligent text processing and computational linguistics (CICLING). 2016.Choudhury M, Chittaranjan G, Gupta P, Das A. Overview of fire 2014 track on transliterated search. Proceedings of FIRE. 2014. pp. 68–89.Ganguly D, Pal S, Jones GJ. Dcu@fire-2014: fuzzy queries with rule-based normalization for mixed script information retrieval. In: Proceedings of the forum for information retrieval evaluation, ACM, 2014. pp. 80–85.Gella S, Sharma J, Bali K. Query word labeling and back transliteration for Indian languages: shared task system description. FIRE Working Notes. 2013;3.Gupta DK, Kumar S, Ekbal A. Machine learning approach for language identification and transliteration. In: Proceedings of the forum for information retrieval evaluation, ACM, 2014. pp. 60–64.Gupta P, Bali K, Banchs RE, Choudhury M, Rosso P. Query expansion for mixed-script information retrieval. In: Proceedings of the 37th international ACM SIGIR conference on research and development in information retrieval, ACM, 2014. pp. 677–686.Gupta P, Rosso P, Banchs RE. Encoding transliteration variation through dimensionality reduction: fire shared task on transliterated search. In: Fifth forum for information retrieval evaluation. 2013.HB Barathi Ganesh, M Anand Kumar, KP Soman. Distributional semantic representation for information retrieval. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. 2016.HB Barathi Ganesh, M Anand Kumar, KP Soman. Distributional semantic representation for text classification. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Järvelin K, Kekäläinen J. Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst. 2002;20:422–46. https://doi.org/10.1145/582415.582418.Joshi H, Bhatt A, Patel H. Transliterated search using syllabification approach. In: Forum for information retrieval evaluation. 2013.King B, Abney S. Labeling the languages of words in mixed-language documents using weakly supervised methods. In: Proceedings of NAACL-HLT, 2013. pp. 1110–1119.Londhe N, Srihari RK. Exploiting named entity mentions towards code mixed IR: working notes for the UB system submission for MSIR@FIRE’16. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. 2016.Anand Kumar M, Soman KP. Amrita-CEN@MSIR-FIRE2016: Code-mixed question classification using BoWs and RNN embeddings. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Majumder G, Pakray P. NLP-NITMZ@MSIR 2016 system for code-mixed cross-script question classification. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Mandal S, Banerjee S, Naskar SK, Rosso P, Bandyopadhyay S. Adaptive voting in multiple classifier systems for word level language identification. In: FIRE workshops, 2015. pp. 47–50.Mukherjee A, Ravi A , Datta K. Mixed-script query labelling using supervised learning and ad hoc retrieval using sub word indexing. In: Proceedings of the Forum for Information Retrieval Evaluation, Bangalore, India, 2014.Pakray P, Bhaskar P. Transliterated search system for Indian languages. In: Pre-proceedings of the 5th FIRE-2013 workshop, forum for information retrieval evaluation (FIRE). 2013.Patel S, Desai V. Liga and syllabification approach for language identification and back transliteration: a shared task report by da-iict. In: Proceedings of the forum for information retrieval evaluation, ACM, 2014. pp. 43–47.Prabhakar DK, Pal S. Ism@fire-2013 shared task on transliterated search. In: Post-Proceedings of the 4th and 5th workshops of the forum for information retrieval evaluation, ACM, 2013. p. 17.Prabhakar DK, Pal S. Ism@ fire-2015: mixed script information retrieval. In: FIRE workshops. 2015. pp. 55–58.Prakash A, Saha SK. A relevance feedback based approach for mixed script transliterated text search: shared task report by bit Mesra. In: Proceedings of the Forum for Information Retrieval Evaluation, Bangalore, India, 2014.Raj A, Karfa S. A list-searching based approach for language identification in bilingual text: shared task report by asterisk. In: Working notes of the shared task on transliterated search at forum for information retrieval evaluation FIRE’14. 2014.Roy RS, Choudhury M, Majumder P, Agarwal K. Overview of the fire 2013 track on transliterated search. In: Post-proceedings of the 4th and 5th workshops of the forum for information retrieval evaluation, ACM, 2013. p. 4.Saini A. Code mixed cross script question classification. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. CEUR-WS.org. 2016.Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill, Inc.; 1986.Sequiera R, Choudhury M, Gupta P, Rosso P, Kumar S, Banerjee S, Naskar SK, Bandyopadhyay S, Chittaranjan G, Das A, et al. Overview of fire-2015 shared task on mixed script information retrieval. FIRE Workshops. 2015;1587:19–25.Singh S, M Anand Kumar, KP Soman. CEN@Amrita: information retrieval on code mixed Hindi–English tweets using vector space models. In: Working notes of FIRE 2016—forum for information retrieval evaluation, Kolkata, India, December 7–10, 2016, CEUR workshop proceedings. 2016.Sinha N, Srinivasa G. Hindi–English language identification, named entity recognition and back transliteration: shared task system description. In: Working notes os shared task on transliterated search at forum for information retrieval evaluation FIRE’14. 2014.Voorhees EM, Tice DM. The TREC-8 question answering track evaluation. In: TREC-8, 1999. pp. 83–105.Vyas Y, Gella S, Sharma J, Bali K, Choudhury M. Pos tagging of English–Hindi code-mixed social media content. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014. pp. 974–979

RiuNet